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ADAPTIVE  REFINEMENT  TREE  -  A  NEW  HIGH-RESOLUTION 
iV-BODY  CODE  FOR  COSMOLOGICAL  SIMULATIONS 


1.  Introduction 

iV-body  techniques  are  used  in  cosmological  simulations  to  follow  the  non-linear  evolution  of  a  system 
of  particles,  and  to  give  theoretical  predictions  about  the  matter  distribution  which  can  be  compared  with 
observations.  The  traditional  iV^-body  methods  are  Particle-Mesh  (PM),  Particle-Particle/Particle-Mesh 
(P^M),  and  TREE  (Hockney  &  Eastwood  1981;  Klypin  &  Shandarin  1983;  Efstathiou  et  al.  1985;  Bouchet 
k  Hernquist  1988;  and  references  therein).  Although  these  algorithms  proved  to  be  useful,  with  currently 
available  computers  all  of  them  fall  short  of  the  spatial  or  mass  resolution  desired  in  cosmological  simulations. 
For  example,  one  needs  a  resolution  ^1  —  10  kpc  to  resolve  a  galaxy  and  a  simulation  cube  of  ^  100  Mpc  to 
sample  appropriately  the  longest  perturbation  waves  or  to  get  sufficient  statistics.  The  required  dynamical 
range  is  thus  10^  —  10®  which  is  much  higher  than  current  codes  can  allow.  These  limitations  have 
motivated  the  development  of  new  methods  with  better  resolution  and/or  performance. 

Villumsen  (1989)  developed  a  code  where  the  PM  grid  was  complemented  by  finer  cubic  subgrids  to 
increase  the  force  resolution  in  regions  of  interest.  The  local  potential  was  calculated  as  a  sum  of  the 
potentials  on  the  subgrids  and  on  the  PM  grid.  A  similar  approach  was  adopted  by  Jessop,  Duncan  k  Chau 
(1994)  in  their  particle-multiple-mesh  code.  However,  instead  of  summing  the  potentials  from  subgrids,  the 
potential  on  each  level  was  obtained  independently  by  solving  the  boundary  problem.  Boundary  values  of 
the  potential  were  interpolated  from  the  coarser  parent  grid.  Couchman  (1991)  used  cubic  refinement  grids 
to  improve  the  performance  of  the  P^M  algorithm.  Here,  the  resolution  was  retained  at  the  level  of  the 
P^M  code  while  the  computational  speed  was  considerably  increased.  In  the  Lagrangian  approach  (Gnedin 
1995;  Pen  1995)  the  computational  mesh  is  not  static  but  moves  with  the  matter  so  that  the  resolution 
increases  (smaller  mesh  cells)  in  the  high  density  regions  and  decreases  elsewhere.  Although  potentially 
powerful,  this  approach  has  its  caveats  and  drawbacks  (Gnedin  k  Bertschinger  1996).  The  mesh  distortions, 
for  example,  may  introduce  severe  force  anisotropies.  A  different  approach  was  adopted  by  Xu  (1995),  who 
developed  the  TPM  code  -  a  hybrid  of  the  PM  and  TREE  algorithms.  The  gravitational  forces  in  the  TPM 
are  calculated  via  a  PM  scheme  on  the  grid  and  via  multipole  expansions  (TREE  algorithm)  in  the  regions 
where  higher  force  resolution  is  desired.  The  forces  on  the  particles  in  low-density  regions  are  calculated  by 
the  PM  scheme,  while  forces  on  the  particles  in  high-density  regions  are  sum  of  external  large-scale  PM  force 
and  internal  short-scale  force  from  the  neighboring  particles.  Although  this  code  may  not  be  faster  than  a 
pure  TREE  code,  it  is  effectively  parallel  because  particles  in  different  regions  can  be  evolved  independently. 
An  adaptive  multigrid  code  for  cosmological  simulations  was  recently  presented  by  Suisalu  k  Saar  (1995). 
In  this  code,  finer  rectangular  subgrids  are  adaptively  introduced  in  regions  where  the  density  exceeds  a 
specified  threshold.  For  each  subgrid  the  potential  is  calculated  using  boundary  conditions  interpolated  from 
the  coarser  grid.  The  solution  on  the  finer  grid  is  used  to  improve  the  solution  on  the  coarser  grid.  Another 
variant  of  an  adaptive  particle-multiple-mesh  code  for  cosmological  simulations  was  recently  presented  by 
Gelato,  Chernoff,  k  Wasserman  (1996).  This  code  can  handle  isolated  boundary  conditions  which  makes  it 
applicable  to  non-cosmological  problems. 

All  of  the  above  muliigrid  methods  use  rectangular  subgrids  to  increase  force  resolution.  For  simulations 
where  there  are  only  a  few  small  regions  of  interest  (e.g.,  a  few  galaxies  or  clusters  of  galaxies)  the  rectangular 
refinements  may  be  a  good  choice  because  these  regions  can  be  easily  covered  by  rectangular  subgrids.  It  is, 
however,  well-known  that  the  geometry  of  structures  in  realistic  cosmological  models  is  usually  a  complicated 
network  of  sheets,  filaments,  and  clumps  which  are  difficult  to  cover  efficiently  with  rectangular  grids. 

In  this  paper  we  present  a  new  Adaptive  Refinement  Tree  (ART)  high-resolution  A^-body  code.  This 
code  was  developed  to  improve  the  spatial  resolution  of  particle-mesh  code  by  about  two  orders  of  magnitude 
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without  loss  in  mass  resolution  and  computational  speed.  In  our  scheme,  the  computational  volume  is  covered 
by  a  cubic  rectangular  grid  which  defines  the  minimum  resolution.  On  this  grid,  the  Poisson  equation  is  solved 
with  a  traditional  Fast  Fourier  Transform  (FFT)  technique  using  periodic  boundary  conditions.  The  finer 
meshes^  are  built  as  collections  of  cubic,  non-overlapping  cells  of  various  sizes  organized  in  octal  threaded 
trees  in  regions  where  the  density  exceeds  a  predefined  threshold.  Any  mesh  can  be  subject  to  further 
refinements;  the  local  refinement  process  stops  when  density  criterion  is  satisfied.  Once  constructed,  the 
mesh  is  not  destroyed  at  every  time  step  but  promptly  adjusted  to  the  evolving  particle  distribution.  To 
solve  the  Poisson  equation  on  these  refinement  meshes,  we  use  a  relaxation  method  with  boundary  conditions 
and  initial  solution  guess  interpolated  from  the  previous  coarser  mesh.  Below  we  present  the  method  (Section 
2),  describe  the  code  (Section  3),  and  discuss  the  tests  (Section  4).  We  then  compare  the  code  with  other 
algorithms  (Section  4),  and  finally  apply  it  to  a  real  cosmological  problem  (Section  5). 


2,  Methodology 
2,1.  Adaptive  mesh  refinement 

Adaptive  mesh  refinement  (AMR)  techniques  for  solving  partial  differential  equations  (PDFs)  have 
numerous  applications  in  different  fields  of  physics,  astrophysics,  and  engineering  where  large  dynamic  range 
is  important.  There  are  two  major  approaches  in  application  of  these  techniques.  In  the  first  approach 
(e.g.,  Berger  &  Oliger  1984;  Berger  1986;  Berger  &  Colella  1989),  the  computational  volume  is  divided  in 
cubic  elements  (cells),  while  in  the  second  (e.g.,  Lohner  &  Baum  1991)  the  cells  can  have  an  arbitrary  shape. 
Collections  of  cells  are  used  as  computational  meshes  on  which  the  PDFs  are  discretized.  We  will  call  meshes 
composed  of  cubic  cells  regular^  calling  meshes  irregular  otherwise.  The  integration  of  PDFs  is  simpler  on 
regular  meshes  but  dealing  with  complicated  boundaries  may  be  a  difficult  problem.  With  irregular  meshes 
one  can  handle  complicated  boundaries  much  more  easily.  The  price,  however,  is  more  elaborate  algorithms, 
data  structures,  and  associated  CPU  and  memory  overhead.  A  particular  choice  of  the  mesh  structure  is 
a  tradeoff  between  these  considerations.  In  astrophysics  there  are  no  complicated  boundaries,  and  a  cubic 
computational  volume  is  usually  used  to  model  a  system.  In  this  case,  there  is  no  need  in  irregular  meshes 
and  it  is  preferable  to  use  meshes  made  of  cubic  cells. 

The  regular  meshes  themselves  can  be  organized  in  different  ways.  The  usual  practice  is  to  use  regular 
meshes  of  cubic  or  rectangular  shape  (e.g.,  Berger  &  Colella  1989)  organized  in  arrays  (grids).  This  allows 
one  to  simplify  data  structures  and  to  use  standard  PDF  solvers.  These  arrays  can  be  organized  in  a  tree 
(Berger  1986)  to  form  a  multigrid  hierarchy.  The  main  disadvantage  of  the  grids  is  that  one  cannot  cover 
regions  of  complicated  shape  in  an  efficient  way.  Moreover,  the  arrays  are  unflexible  data  structure,  and 
the  whole  refinement  hierarchy  should  be  periodically  rebuilt,  not  adjusted,  when  dealing  with  unsteady 
solutions. 

In  our  approach,  we  use  regular  meshes  but  they  are  handled  in  a  completely  different  way.  Cells  are 
treated  as  individual  units  which  are  organized  in  refinement  trees  (see  Section  3.2).  Fach  tree  has  a  root 
“  a  cell  belonging  to  a  base  cubic  grid  which  covers  the  entire  computational  volume.  If  the  root  is  refined 
(split)  -  it  has  8  children  (smaller  non-overlapping  cubic  cells  residing  in  its  volume)  which  in  their  turn  can 
be  refined,  and  so  on.  Cells  of  a  given  refinement  level  are  organized  in  linked  lists  and  form  a  refinement 


^  We  will  iise  word  grid  to  refer  to  the  cubic  or 
arbitrary  shape. 


rectangular  configurations,  reserving  the  word  mesh  for  configurations  of 
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mesh.  The  tree  data  structures  make  mesh  storage  and  access  in  memory  logical  and  simple,  while  linked 
lists  allow  for  efficient  mesh  structure  traversals.  In  the  current  version  of  the  code  we  make  use  of  octal 
threaded  trees  (Khokhlov  1997)  and  doubly  linked  lists  (e.g.,  Knuth  1968;  Aho,  Hopcroft,  &  Ulman  1983; 
Corner,  Leiserson,  &  Rivest  1994).  The  fact  that  cells  are  treated  as  independent  units  rather  than  element 
of  a  grid  allows  us  to  build  a  very  flexible  mesh  hierarchy  which  can  be  easily  modified.  The  details  of  the 
mesh  generation  and  modification  in  our  code  are  described  in  Section  3, 


2.2.  Multilevel  relaxation  method 

The  multigrid  techniques  of  solving  partial  differential  equations  (Brandt  1977)  are  very  successful  in 
reducing  the  computational  and  storage  requirements  for  solving  many  types  of  PDEs  (Wesseling  1992,  Press 
et  al.  1992).  There  are  two  kinds  of  multigrid  algorithms.  The  first,  sometimes  called  muliigrid  method^ 
is  used  to  speed  up  convergence  of  relaxation  method.  In  this  method,  the  source  term  is  defined  only  on 
the  base  finest  grid  -  all  the  other  coarser  grids  are  used  as  a  workspace.  In  the  second  algorithm,  called 
full  muliigrid,  the  source  term  is  defined  on  all  grids,  and  the  method  obtains  successive  solutions  on  finer 
and  finer  grids.  The  latter  method  is  useful  when  dealing  with  grids  created  in  adaptive  refinement  process. 
The  full  multigrid  scheme  in  its  turn  can  be  used  differently  depending  on  how  the  solutions  on  different 
levels  influence  each  other.  In  the  one-way  interface  scheme,  the  solution  from  a  coarser  grid  is  used  to  get 
a  first  guess  solution  on  the  finer  grid,  and  often  to  get  boundary  values  as  well.  However,  the  solution  on 
the  coarser  grid  is  not  influenced  by  the  solution  on  the  finer  grid  (e.g.,  Jessop,  Duncan,  &  Chau  1994).  In 
the  two-way  interface  scheme,  the  coarser  grid  solution  is  used  to  correct  the  solution  on  the  finer  grids  and 
vice-versa.  The  choice  of  a  particular  scheme  is  usually  determined  empirically,  and  is  problem  dependent. 
The  two-way  interface  scheme  is  more  difficult  to  implement  in  the  case  of  periodic  boundary  conditions 
(Suisalu  k  Saar  1996). 

In  our  approach,  each  refinement  mesh  is  composed  of  cells  of  the  same  refinement  level  but  these 
meshes  are  completely  different  from  grids.  The  techniques  are  thus  multilevel  rather  than  multigrid.  We 
use  an  analog  of  full  multigrid  algorithm  with  the  one-way  interface  between  the  meshes.  We  use  a  regular 
cubic  grid  covering  the  whole  computational  volume  as  the  zeroth,  coarsest  level.  At  this  level,  the  Poisson 
equation  is  solved  using  a  standard  FFT  method  with  periodic  boundary  conditions.  This  solution  is  then 
interpolated  on  to  the  first  level  finer  mesh  to  get  the  boundary  values  and  first  solution  guess.  Once  the 
boundary  problem  is  defined,  we  use  a  relaxation  method  (e.g..  Press  et  al.  1992)  to  solve  the  Poisson 
equation  on  this  mesh.  As  we  start  from  an  initial  guess  which  is  already  close  to  the  final  solution,  the 
iterative  relaxation  procedure  converges  fast.  After  we  get  the  solution  on  the  first  refinement  level,  the  same 
procedure  (obtaining  boundary  values  and  initial  guess  by  interpolation  from  the  previous  coarser  level)  is 
repeated  for  the  next  level,  and  so  forth.  At  the  end  of  this  process  we  have  the  solution  (potential)  for  all 
cells.  The  description  of  the  code  is  given  in  the  next  section. 


3.  Description  of  the  code 
3.1.  Code  structure 

The  structure  of  the  code  can  be  outlined  as  follows.  First  of  all,  we  set  up  the  initial  positions  and 
velocities  of  the  particles  using  the  Zel’dovich  approximation  as  described  by  Klypin  k  Shandarin  (1983). 
Once  the  initial  conditions  are  set  we  construct  the  regular  cubic  grid  covering  the  whole  computational 
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volume  and  then  proceed  to  check  if  additional  refinement  levels  are  required  according  to  the  current 
density  threshold.  At  this  point  the  code  enters  the  main  computational  loop  which  includes: 

•  density  assignment  on  all  existing  meshes; 

•  gravitational  solver; 

•  routine  updating  particle  positions  and  velocities; 

•  modifications  to  the  mesh  hierarchy. 

The  mesh  modifications  (refinement  and  derefinement)  are  based  on  the  density  distribution^.  The  modifi¬ 
cations  are  made  at  the  end  of  the  computational  cycle.  At  this  point  the  density  distribution  is  available 
(it  was  calculated  for  the  gravitational  solver). 

Below  we  will  describe  each  of  these  major  functional  blocks  in  detail.  We  will  also  discuss  memory 
requirements  of  the  code,  timing,  and  energy  conservation. 


3.2.  Mesh  generator. 

The  adaptive  mesh  refinement  block  of  the  code  generates  new  and  modifies  existing  meshes.  The 
refinement  hierarchy  in  our  implementation  is  based  on  the  regular  cubic  grid  which  covers  the  entire  com¬ 
putational  volume.  With  the  refinement  block  turned  off,  the  density  assignment  and  gravity  solver  on  this 
grid  are  similar  to  those  in  the  PM  code  of  Kates,  Kotok,  &  Klypin  (1991). 

The  data  structures  that  we  use  to  organize  the  mesh  cells  are  very  similar  to  those  implemented  in  the 
hydrodynamical  Eulerian  Tree  Refinement  code®  (Khokhlov  1997).  All  mesh  cells  are  organized  in  refinement 
trees,  A  cell  can  be  a  parent  of  eight  children  -  smaller  cubic  cells  of  equal  volume  residing  in  it.  Each  child 
may  be  in  its  turn  split  and  have  children.  Each  tree  has  a  root  -  a  zero  level  cell  -  which  may  be  the  only 
cell  in  this  tree  if  it  is  unsplit.  The  tree  ends  with  unsplit  cells,  which  we  call  leaves.  This  structure  is  called 
an  octal  rooted  tree  ~  the  construct  which  is  used  in  TREE  codes.  There  is,  however,  an  important  difference. 
We  use  fully  threaded  trees^  that  is  trees  which  are  connected  with  each  other  on  all  levels^.  Note,  that  we  can 
consider  all  cells  as  belonging  to  a  single  threaded  tree  with  a  root  being  the  entire  computational  domain 
and  the  base  grid  being  one  of  the  tree  levels.  The  tree  structure  is  supported  through  a  set  of  pointers. 
Each  cell  has  a  pointer  to  its  parent  and  a  pointer  to  its  first  child.  In  addition,  cells  has  pointers  to  the 
six  adjacent  cells  (these  make  the  tree  fully  threaded)  so  that  information  about  celPs  neighbors  is  easily 
accessible  (see  Fig.l).  Overall,  the  following  information  is  provided  for  each  cell  i  belonging  to  a  tree: 

♦  Level{i)  -  level  of  the  cell  in  the  tree; 

♦  Pareni{i)  -  pointer  to  the  parent  cell; 


^The  density  criterion  in  our  case  is  a  natur£d  choice  because  we  aim  to  resolve  high  density  regions.  We  could  use,  however, 
any  other  appropriate  criterion,  e.g.  local  potential  gradient,  force  accuracy  etc. 

^ There  are,  however,  some  Import ctnt  modifications  which  were  required  by  specifics  of  the  cosmological  simulations. 

^  hi  simple  threaded  trees  only  leaves  are  connected. 
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•  Child{i)  -  pointer  to  the  cell’s  first  child  or  nil  if  the  cell  is  a  leaf; 

•  Nb{iyj)  -  pointers  to  the  neighbors  (j  =  1  6); 

•  Pos{ijj)  -  position  in  space  ( j  =  1  — ►  3); 

•  V’ar(r,  n)  -  storage  for  associated  physical  variables  (in  our  case  n  =  2,  as  we  keep  both  density  and 
potential). 

The  above  set  of  pointers  is  sufficient  to  support  the  tree  structure  and  to  change  it  dynamically  with 
minimum  cost.  In  addition,  the  cells  on  each  level  of  the  mesh  hierarchy  are  organized  in  doubly  linked 
lists^  (e.g.,  Knuth  1968)  so  that  a  sweep  through  a  given  level  (the  operation  used  extensively  in  multigrid 
relaxations  described  below)  can  be  done  with  minimum  CPU  time.  The  cells  belonging  to  the  base  regular 
grid  (level  zero),  while  part  of  the  same  data  structure,  are  created  only  at  the  very  beginning  of  a  simulation 
and  are  never  destroyed.  It  is  therefore  unnecessary  to  keep  information  about  a  cell’s  position  or  pointers 
to  its  neighbors  because  they  can  be  easily  computed.  The  number  of  pointers  can  be  considerably  reduced 
(by  as  much  as  a  factor  of  2)  because  some  of  them  can  be  shared  by  siblings  (the  eight  cells  which  have  the 
same  parent). 

An  elementary  refinement  process  creates  8  new  cubic  cells  of  equal  volume  (children)  inside  a  parent 
cell.  When  the  parent  is  refined,  we  check  if  all  six  neighbors  are  of  the  same  level  as  the  parent.  If 
there  are  neighbors  of  smaller  level  (coarser)  than  the  parent,  we  split  them.  If  a  neighbor  in  its  turn  has 
coarser  neighbors,  we  split  the  neighbor’s  neighbors,  and  so  forth.  We  thus  build  a  refinement  structure 
which  obeys  a  rule  allowing  no  neighbor  cells  with  level  difference  greater  than  1.  Examples  of  allowed  and 
prohibited  configurations  are  shown  in  Fig.2.  Although  this  is  the  only  rule  in  the  whole  refinement  process, 
it  determines  the  global  structure  of  the  resulting  refinement  hierarchy  assuring  that  on  a  level’s  boundaries 
there  are  no  sharp  resolution  gradients.  On  the  next  refinement  pass  each  of  the  newly  born  children  is 
checked  against  the  density  criterion  and  can  be  in  its  turn  subdivided  into  8  children  if  further  splitting  is 
needed.  The  process  stops  when  either  the  density  criterion  is  satisfied  everywhere  or  the  maximum  allowed 
refinement  level  is  reached. 

The  refinement  process  proceeds  level  by  level  starting  from  the  base  grid.  On  any  level  of  the  mesh 
hierarchy  the  process  can  be  split  into  two  major  parts.  First,  we  mark  up®  all  the  cells  which  need  to  be 
split  creating  a  refinement  map.  However,  the  map  constructed  this  way  tends  to  be  “noisy”.  We  smooth 
it  by  marking  additional  cells  so  that  any  cell  which  was  marked  originally  is  surrounded  by  a  buffer  of 
at  least  two  other  marked  cells.  We  construct  this  buffer  using  an  algorithm  which  includes  several  passes 
through  a  level,  each  one  marking  additional  cells.  During  the  first  pass  the  neighbors  of  cells  marked  in  the 
refinement  map  are  marked  for  splitting  also.  After  that,  two  passes  are  made  in  which  we  mark  for  splitting 
only  those  cells  which  have  at  least  two  neighbors  already  marked  for  refinement  (note,  that  we  mean  cells 
marked  during  all  previous  (not  the  current)  passes).  These  three  passes  create  a  one-cell  cubic  buffer  around 
each  of  the  cells  marked  in  the  original  refinement  map.  Each  additional  set  of  three  passes  similar  to  those 
described  above  will  build  one  more  cubic  layer  around  every  originally  marked  cell.  Therefore,  to  build  a 
two-cell  buffer  we  make  six  passes.  When  the  map  is  completed,  it  is  used  to  make  the  actual  splitting. 


^The  difference  between  a  doubly  linked  list  and  the  usual  linked  list  (used,  for  example,  in  the  P^M  codes)  is  that  in  the 
former  we  keep  not  only  a  pointer  to  the  next  clement  but  also  a  pointer  to  the  previous  element  in  the  list.  This  aUows  us  to 
insert  and  delete  list  entries  without  rebuilding  the  whole  list. 

cell  is  marked  for  splitting  when  the  local  density  exceeds  a  predefined  level-dependent  threshold. 
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The  refinement  procedure  described  above  can  be  used  either  to  construct  the  mesh  hierarchy  from 
scratch  or  to  modify  the  existing  meshes.  However,  in  the  course  of  a  simulation  the  structure  is  neither 
constructed  nor  destroyed.  Instead,  on  every  computational  cycle  we  modify  existing  meshes  to  account  for 
the  changes  in  particle  distribution.  Therefore,  we  need  to  make  not  only  refinements  but  also  derefinements 
(in  the  places  where  it  is  no  longer  necessary  to  keep  resolution  at  the  current  level)  which  is  done  in  the 
same  manner  by  constructing  a  derefinement  map  -  that  is,  map  of  cells  marked  for  joining.  If  the  joining 
violates  the  above  mentioned  neighbor  rule  nothing  is  done  and  the  cell  is  kept  split.  Therefore,  the  code 
modifies  the  existing  structure  dynamically  keeping  the  refinements  in  accord  with  the  ever  changing  density 
field.  The  modification  of  the  hierarchy  requires  much  less  CPU  time  than  rebuilding  it  because  only  a  small 
number  of  cells  needs  to  be  modified  at  any  given  time  step.  Fig.3  shows  an  example  of  the  refinement  mesh 
hierarchy  built  in  one  of  the  ACDM  cosmological  simulations  described  in  Section  4.4 


3.3.  Particles  within  the  mesh  hierarchy  and  density  assignment 

Particle  coordinates  are  not  sufficient  to  specify  the  particle-mesh  connection  because  cells  of  different 
levels  can  share  the  same  volumes.  We  need  to  know,  however,  which  particles  belong  to  a  given  cell.  This 
is  done  by  arranging  particles  in  doubly  linked  lists  so  that  every  cell  “knows”  its  head  linked  list  particle 
(the  head  is  nil  if  the  cell  is  empty)  and  thus  all  the  other  particles  in  this  linked  list.  If  a  particle  moves 
from  cell  to  cell,  it  is  deleted  from  the  linked  list  of  the  cell  it  leaves  and  is  added  to  the  new  celPs  linked 
list.  Only  leaves  are  allowed  to  own  particles.  Once  a  cell  is  split,  all  its  particles  are  divided  among  its 
children.  However,  we  solve  the  Poisson  equation  on  every  refinement  level  so  that  the  density  is  needed  in 
every  cell  regardless  of  whether  or  not  it  is  a  leaf.  On  each  level,  starting  from  the  finest  and  up  to  the  level 
zero,  the  density  is  assigned  using  the  standard  Cloud-In-Cell  (CIC)  technique  (Hockney  &  Eastwood  1981). 
Because  particles  belong  only  to  the  finest  cells  enclosing  them,  when  we  go  from  level  to  level  the  particles 
are  passed  from  children  to  their  parents.  This  is  done  only  for  the  density  assignment  and  the  linked  list  is 
not  changed.  The  particles,  therefore,  contribute  to  the  density  on  any  level  they  are  physically  located  on. 


3.4.  Poisson  solver 

The  fact  that  level  zero  of  the  mesh  hierarchy  is  a  cubic  regular  grid  of  fixed  resolution  allows  us  to 
use  the  FFT  method  to  solve  the  Poisson  equation  on  this  grid  (Hockney  &  Eastwood  1981).  The  FFT 
technique  naturally  supports  periodic  boundary  conditions  which  is  important  for  cosmological  simulations. 

The  Poisson  equation  on  the  refinement  meshes  is  defined  as  a  Dirichlet  boundary  problem  where 
boundary  values  are  obtained  by  interpolating  the  potential  from  the  parent  grid.  In  our  algorithm,  the 
boundaries  of  the  refinement  meshes  can  have  an  arbitrary  shape  which  narrows  the  range  of  PDE  solvers 
that  one  can  use.  To  solve  the  Poisson  equation  on  these  meshes,  we  have  chosen  the  relaxation  method 
(Hockney  &  Eastwood  1981;  Press  et  al.  1992).  It  is  relatively  fast  and  eflScient  in  dealing  with  complicated 
boundaries.  In  this  method  the  Poisson  equation 


vV  =  /> 


is  rewritten  in  the  form  of  a  diffusion  equation. 


dr 


=  vV-/», 


(1) 


(2) 
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or,  in  finite  difference  form: 


^i,j,k  + 


At 

A2 


(e^-- 

\nh=l 


6C 


(3) 


where  the  summation  is  performed  over  a  celPs  neighbors.  Here,  A  is  the  actual  spatial  resolution  of  the 
solution  (potential),  while  Ar  is  a  fictitious  time  step  (it  is  not  related  to  the  actual  time  integration  of 
the  iV-body  system).  This  finite  difference  method  is  stable  when  Ar  <  A^/6  (Press  et  al.  1992).  If  we 
choose  the  maximum  allowed  time  step  Ar  =  A^/6,  the  above  equation  can  be  rewritten  in  the  form  of  the 
following  iteration  formula: 


The  relaxation  iteration  is  thus  averaging  the  potential  of  a  celPs  six  neighbors  and  subtracting  the  contri¬ 
bution  from  the  source  term.  Cells  in  the  boundary  layer  will  have  some  neighbors  belonging  to  the  coarser 
level.  In  this  case,  we  need  to  interpolate  to  get  the  potential  at  the  location  of  the  expected  neighbor.  It 
is  desirable  that  the  interpolation  maintain  continuity  and  isotropy  of  the  force  (see  discussion  in  Jessop  et 
al.  1994).  We  have  found  that  linear  interpolation  perpendicular  to  the  boundary  which  incorporates  both 
coarser  and  finer  cell  potentials  is  satisfactory;  we  get  the  interpolated  value  of  the  potential  on  the  boundary 
of  level  I  as: 

-h  (1  —  (5) 

Here  Wi  is  a  weight,  and  and  are  the  potentials  of  a  boundary  cell  of  level  I  and  of  its  (/  —  l)-level 
neighbor.  We  have  found  the  optimal  value  of  Wi  to  be  0,2  by  minimizing  the  force  discontinuity  for  particles 
moving  through  mesh  boundaries.  The  iterative  procedure  described  above  is  then  repeated  until  the  desired 
level  of  convergence  is  achieved.  We  can  considerably  speed-up  the  convergence  of  the  relaxation  procedure 
by  using  an  initial  guess  for  the  solution  which  is  already  close  to  the  final  solution.  Such  an  initial  guess 
can  be  obtained  by  interpolating  the  potential  from  the  previous  coarser  mesh,  where  the  Poisson  equation 
was  already  solved.  By  doing  so,  we  need  only  2-3  iterations  to  get  the  potential  with  an  accuracy  of  a 
couple  per  cent.  Nevertheless,  a  higher  accuracy  is  needed  because  the  potential  is  then  differentiated  to  get 
the  accelerations  and  the  errors  in  accelerations  are  thus  larger  than  errors  in  the  potential.  Therefore,  we 
would  need  to  make  more  iterations  to  reach  the  same  ^  1  —  2%  accuracy  level.  The  number  of  required 
iterations,  however,  can  be  considerably  reduced  by  using  the  so-called  Successive  Overrelaxaiion  (SOR) 
technique  (Hockney  &  Eastwood  1981;  Press  et  al.  1992).  In  this  technique  the  solution  in  a  given  cell  is 
computed  as  a  weighted  average, 

=a;(^"+^H-(l-w)0”,  (6) 

where  is  the  solution  obtained  via  the  iteration  equation  (4),  (j)^  is  the  solution  from  previous  iteration 
step,  and  w  is  the  overrelaxaiion  parameter.  The  parameter  w  can  be  adjusted  to  minimize  the  number  of 
iterations  required  to  achieve  a  certain  accuracy  level.  Of  course,  there  is  no  point  in  using  more  iterations 
than  is  needed  to  make  the  iteration  error  smaller  than  truncation  error.  The  latter  can  be  estimated  by 
making  the  number  of  iterations  very  large  so  that  the  iteration  error  is  negligible. 

The  ultimate  goal  of  any  7\r-body  algorithm  is  to  get  an  accurate  approximation  to  the  pairwise  inter¬ 
particle  forces.  Therefore,  we  use  the  force  accuracy  (see  Section  4)  to  determine  the  required  number  of 
iterations.  We  do  this  by  adjusting  the  overrelaxation  parameter  u  to  minimize  the  number  of  iterations 
while  keeping  the  force  accuracy  at  the  level  of  truncation  errors.  We  have  found  empirically  that  only  10 
relaxation  iterations  is  needed  if  w  =  1.25. 
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3.5.  Particle  dynamics 


To  integrate  the  trajectories  of  the  dark  matter  particles  we  use  the  Newtonian  equations  of  motion 
in  an  expanding  cosmological  framework  (e.g.,  Peebles  1980).  These  equations  can  be  expressed  in  terms 
of  comoving  coordinates  x  related  to  the  proper  coordinates  as  r  =  a(i)x,  where  a{t)  =  (1  +  is  the 
expansion  factor: 


rr 

dt  ~ 


dx 

dt 


(7) 


where  p  is  the  momentum  of  a  particle  and  Vx4>  is  given  by  the  Poisson  equation  relating  the  potential  ^  to 
deviations  of  density  from  the  background: 


=  47rGa^(/j-/9).  (8) 

The  above  equations  are  integrated  numerically  using  dimensionless  variables 

X  =  a^ox,  t  =  r/Fo.  <i>  =  P  =  p(xo^o),  P  =  (9) 

ottO 

where  xq  is  the  length  of  a  zero  level  mesh  cell  and  /fo  is  the  Hubble  constant.  We  also  use  the  expansion 
factor  a  instead  of  the  time  t  so  that  equations  (7)”(8)  can  be  rewritten  as: 

=  /(JIm  )  >  o)  (10) 

vv-  = 

Here  Q,m  is  the  present  day  {z  =  0)  contribution  of  matter  to  the  total  density  of  the  universe  and  Q\  is  the 
corresponding  contribution  of  the  vacuum  energy  (measured  by  the  cosmological  constant).  The  function  / 
is  specific  to  a  given  cosmological  model.  The  general  form  of  this  function,  valid  for  open,  flat,  and  closed 
cosmologies,  is  (e.g.  Carrol  et  al.  1992): 

/(fijlf,  QA)<*)  =  — -  - -  ■  . . .  (11) 

y/l  +  Qm  (l/fl  -  1)  +  fiA(^^ 1) 


We  adopt  a  standard  second-order  leap-frog  integration  scheme  of  advancing  particles  to  the  next  time 
step.  For  a  step  n,  corresponding  to  time  step  a„  =  a,nxt  +  «Aa,  the  momenta  and  positions  of  particles  are 
updated  as  follows: 


Pn+^  Pn-i  ““ /(^Af  j  f^Aj  ^n)  Aa, 

(12) 

V  I 

x„+i  =  x„  + /(QM)f2A,an+j)  "f-'"  Aa. 

Here  indices  n,  n  +  1,  and  n  ±  -  refer  to  quantities  evaluated  at  a„,  a„+i,  and  a„  ±  Ao/2  respectively. 
Although  multiple  time  stepping  is  probably  very  efficient  in  terms  of  CPU  time,  in  the  current  version  of 
the  code  we  use  a  constant  time  step  for  all  particles.  We  plan  to  implement  individual  time  steps  for  different 
levels  in  the  future.  Particle  coordinates  and  velocities  are  updated  using  their  accelerations  obtained  via 
numerical  differentiation  of  the  potential  and  the  consequent  interpolation  to  the  particle  location  using 
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the  CIC  method  (Hockney  &  Eastwood  1981).  There  are,  however,  some  complications  because  particles 
can  move  through  the  level  boundaries.  The  resolution  gradients,  for  example,  may  induce  unwanted  force 
fluctuations  and  anisotropies  (Jessop  et  al.  1994;  Anninos,  Norman,  &  Clarke  1994).  In  addition,  momentum 
conservation,  achieved  by  exact  cancellation  of  numerical  terms  in  the  CIC  method,  is  no  longer  guaranteed. 
This  means  that  additional  care  must  be  taken  to  minimize  these  effects.  Usually,  this  is  done  by  introducing 
extra  buffer  regions  along  the  mesh  interfaces  so  that  force  interpolation  on  the  boundaries  is  avoided.  In 
our  code,  we  do  not  introduce  additional  buffer  cells  on  the  mesh  boundaries  because  the  meshes  are  already 
expanded  by  smoothing  (see  Section  3.2).  Therefore,  we  simply  prohibit  force  interpolation  that  uses  both 
coarse  and  fine  boundary  cells,  interpolating  instead  on  the  coarse  level.  In  this  way,  particles  are  driven  by 
the  coarse  force  until  they  move  suflBciently  far  into  the  finer  mesh.  The  same  is  true  for  particles  moving 
from  the  finer  to  coarser  mesh. 


3.6*  Parallelization 

The  success  of  a  numerical  algorithm  largely  depends  on  how  easily  it  can  be  parallelized  to  run  on  large 
multiprocessor  computers.  The  parallelization  strategies  are  problem  dependent  and  are  different  for  shared 
and  distributed  memory  machines.  Here  we  briefly  outline  the  parallelization  specifics  of  the  ART  code. 

Functionally,  the  code  can  be  decomposed  in  two  parts  -  the  PM  part  working  on  the  base  grid  and 
the  part  dealing  with  refinement  meshes.  The  PM  algorithm  was  successfully  parallelized  for  both  massively 
parallel  distributed  memory  (such  as  CM-5  (Ferrell  &  Bertschinger  1994;  Smith  1995)  or  SP-2  (Gross  1997)) 
and  shared  memory  machines  (Yepes  et  al.  1996).  To  be  ported  to  distributed  memory  systems  the  ART  code 
requires  more  elaborate  functional  and  data  organization  (Khokhlov  1997)  but  is  relatively  straightforward 
to  parallelize  for  shared  memory  architectures.  The  most  CPU  expensive  parts  of  the  code  Poisson  solver 
and  force  interpolation  -  are  effectively  parallel.  In  the  mesh  modification  routine  the  most  expensive  part 
-  the  construction  of  refinement  and  derefinement  maps  -  is  also  parallel.  Currently,  blocks  of  the  code  that 
require  considerable  parallelization  efforts  but  are  relatively  CPU  inexpensive  -  the  density  assignment  and 
cell  splitting/joining  during  the  mesh  modifications^  -  are  run  serially. 

The  present  version  of  the  code  was  parallelized  to  run  in  shared  memory  mode  on  the  HP-Convex  SPP- 
1200  Exemplar  -  a  multi-purpose  scalable  parallel  computer.  The  timing  and  benchmarking  was  performed 
for  one  of  the  simulations  described  in  Section  3.8. 


3,7.  Memory  requirements 

Memory  requirements  of  the  code  are  determined  by  the  number  of  dark  matter  particles  iVp,  number  of 
cells  in  zero  level  base  grid  iV®,  and  number  of  cells  on  refinement  levels  7V^.  In  the  current  implementation 
of  the  code  the  total  number  of  memory  storage  elements  N  used  by  the  code  is: 

nART  ^  ^  ^  (13) 

The  overhead  for  is  determined  by  the  pointers  that  are  used  to  support  the  tree  refinement  hierarchy 
(see  Section  3.2).  It  can  be  reduced  by  a  factor  of  ^  2  —  3,  if  some  of  the  pointers  will  be  shared  by  siblings 


^Although  the  cell  splitting  and  joining  procedure  is  serial,  it  takes  a  negligible  amount  of  CPU  time  because  we  need  to  do 
it  only  for  a  small  number  of  cells  on  each  time  step. 
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(the  eight  cells  which  have  the  same  parent).  The  overhead  can  be  reduced  even  further  (making  number 
of  storage  elements  per  refinement  cell  only  ~  2.5)  by  incorporating  more  elaborate  ideas  (Khokhlov  1997). 
The  estimated  memory  requirement  for  an  optimized  code  is: 

«  9iVp  +  +  4.51V^  (14) 

We  plan  to  implement  these  improvements  in  the  future  versions  of  the  code. 

The  can  be  compared  to  the  corresponding  number  of  storage  elements  in  a  PM  code: 

6iVp  +  J\r0;  (15) 

The  apparent  overhead  of  the  ART  code  compared  to  PM  is  the  price  for  fully  adaptive  and  flexible  mesh 
structure.  It  should  be  noted,  however,  that  to  increase  resolution  by  a  factor  of  2  in  a  PM  code  the  number 
of  cells  must  be  increased  by  a  factor  of  8  which  severely  limits  the  maximum  possible  dynamic  range 
(which  with  largest  currently  available  computers  is  1000).  In  the  ART  code,  the  resolution  is  improved  by 
increasing  which  changes  very  slowly  when  resolution  is  increased.  For  example,  to  increase  resolution 
in  the  highest  density  regions  by  a  factor  of  2  in  the  simulations  described  in  Section  5  (see  also  Table  1), 
the  total  number  of  cells  was  increased  only  by  ^  3%.  Note  also  that  the  dynamic  range  of  ^  4000  was 
achieved  with  only  5  x  10®  cells  while  a  PM  code  would  require  ^  6.4  x  10^®  cells  to  reach  the  same 
resolution.  For  comparison,  the  memory  requirements  of  the  publicly  available  versions  (kindly  provided  by 
J.Barnes  and  H.M.P.Couchman)  of  TREE  code  and  AP^M  code  are  llNp-\-18Ncens,  where  Neeiu 

is  the  number  of  tree  cells,  and  ^  «  13 Ap  (not  including  the  overhead  related  to  the  refinement 

meshes  in 


3.8.  Timing 

In  this  section  we  present  timings  of  the  current  version  of  the  code  and  compare  the  performance  with 
other  high-resolution  i\^-body  codes.  In  Fig.5a  we  show  the  performance  of  different  blocks  of  the  code 
with  respect  to  the  expansion  parameter  in  a  ACDM  simulation  (Qa  =  1  —  fio  =  0.7,  h  =  0.7,  as  =  1.0) 
of  an  L  =  I5h~^  Mpc  box  with  N  =  32^  particles  and  64^  base  grid  (more  details  are  given  in  Section 
4.4).  The  simulation  was  run  on  8  CPUs  of  the  NCSA  SP-1200  Exemplar  in  shared  memory  mode.  The 
CPU  overhead  for  running  code  in  parallel  is  about  50%  and  the  same  simulation  run  on  an  IBM  RS/6000 
workstation  performs  ^  2.5  times  faster  (in  terms  of  CPU  but  not  in  terms  of  wallclock  time!).  The  overhead 
is  mostly  due  to  the  unusually  large  penalty  for  cache  missing  events  which  results  if  memory  is  accessed 
randomly.  In  Table  1  we  present  timing  for  different  code  blocks  for  the  final  time  step  {z  =  0)  of  two 
similar  ACDM  simulations  with  64^  particles  (see  section  5.2)  and  with  different  resolutions.  The  base  grid 


Table  1.  Code  timings  for  ACDM  simulations  with  64^  particles  on  8  CPUs  of  the  SPP-1200  Exemplar. 


Routine 

Maximum 

Number 

Density 

FFT 

Relaxation 

Particle 

Mesh 

Total 

level 

of  mesh  cells 

assignment 

solver 

solver 

motion 

modifications 

CPUl  (sec) 

4 

4860976 

25.8 

26.2 

54.5 

34.8 

44.0 

227.3 

CPU2  (sec) 

6 

5019136 

30.6 

26.4 

82.7 

40.8 

55.7 

285.9 
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in  both  simulations  was  128^  and  numbers  of  the  maximum  allowed  refinement  levels  were  4  and  6.  As 
before,  the  simulations  were  run  on  8  CPUs  of  the  SPP-1200  Exemplar.  We  compare  the  timings  from  Table 
1  with  the  performance  of  the  AP^M  code  (Couchman  1991).  The  final  step  in  a  two  level  AP^M  simulation 
of  an  open  (fio  =  0.5,  Qa  =  Oj  =  0.63,  (Ts  =  1.2)  cosmology  (box  L  =  150h“^  Mpc,  128^  grid,  128^ 
particles,  and  smoothing  kernel  t]  =  0.1  cell  giving  a  dynamic  range  of  about  1000)  took  1316  CPU  seconds 
on  one  processor  of  IBM  SP-2  computer  (S.Borgani  1996,  private  communication).  This  number  is  roughly 
consistent  with  timings  presented  in  original  paper  of  Couchman  (1991)  if  we  account  for  the  difference  in 
MFLOPs  between  used  machines.  The  first  simulation  in  Table  1  is  comparable  in  spatial  resolution  to 
the  above  AP^M  simulation  but  we  have  to  account  for  the  different  number  of  particles.  Only  density 
assignment  and  particle  motion  directly  scale  with  number  of  particles.  We,  therefore,  multiply  the  CPU 
time  spent  by  these  routines  by  8  which  gives  a  total  time  for  the  final  step  650  CPU  seconds.  Note, 
however,  that  about  half  of  this  CPU  time  is  penalty  for  the  cache  missings  which  would  be  negligible  for 
a  serial  run  on  the  SP-2.  The  ART  code,  therefore,  is  about  3  times  faster  than  AP^M  code  of  comparable 
resolution.  Note  also  that  although  in  the  second  simulation  from  Table  1  the  resolution  was  increased  by  a 
factor  of  4,  the  CPU  time  does  not  change  significantly  because  only  a  relatively  small  number  of  additional 
cells  was  required  to  resolve  the  highest  density  regions. 


3.9.  Energy  conservation 


In  an  expanding  Universe,  energy  conservation  is  expressed  by  the  Irvine-Layzer>Dmitriev-ZePdovich 


equation: 


or 


Where 


|[a(r  +  t/)l 

a(r +£/)!“„  = 


1 

II 

(16) 

-  r  Tda. 

(17) 

Jao 

1 

(18) 

The  error  in  energy  conservation  at  a  time  a,-  is  then  measured  by  comparing  the  change  in  total  energy  with 
the  change  in  aU : 

UU  [oo 


In  Fig.  5b  we  show  the  energy  conservation  error  versus  expansion  parameter  a  for  two  ACDM  simulations 
with  32^  and  64^  particles.  In  both  simulations  the  time  step,  Aa,  was  chosen  so  that  all  the  particles  would 
move  not  more  than  a  fraction  (^  0.1  —  0.3)  of  the  mesh  cell  they  are  residing  in.  We  note  that  energy  is 
conserved  at  the  level  of  ^  2%  in  the  32^  particle  simulation  and  at  the  level  of  1%  in  the  64^  particle 
simulation.  The  maximum  error  of  5%  and  3%  for  the  two  simulations  occurred  when  the  first  two 
refinement  levels  were  opened  at  a  ^  0.18.  This  may  be  a  result  of  the  rather  fast  change  in  resolution  in 
the  regions  of  ongoing  nonlinear  collapse. 
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4.  Tests  of  the  code 

In  this  section  we  present  tests  of  the  developed  code.  Particularly,  we  show  results  describing  the 
accuracy  of  the  force  calculation  in  ART  scheme  and  related  issues  of  resolution.  We  discuss  the  results  of 
the  ZePdovich  pancake  test  and  of  the  spherical  infall  test.  Finally,  we  compare  results  for  a  set  of  realistic 
cosmological  runs  obtained  with  ART  and  PM  code. 


4.1.  Force  accuracy 

It  is  important  to  know  the  shape  and  accuracy  of  both  short-  and  long-range  forces,  when  the  resolution 
of  diiferent  codes  is  compared.  Here  we  present  a  test  showing  the  accuracy  of  force  calculations  in  PM  and 
ART  schemes  and  compare  them  to  the  Plummer  softened  force  which  is  often  used  in  both  P^M  and  TREE 
codes. 

We  used  a  64^  base  grid  with  a  massive  particle  in  the  center  and  a  second  particle  placed  randomly 
nearby.  The  refinement  meshes  were  constructed  up  to  the  specified  level,  so  that  both  particles  were  located 
on  this  level.  The  usual  potential  and  force  calculations  (described  above)  were  then  performed  to  get  the 
pairwise  force  between  these  two  particles  which  was  compared  with  the  “exact”  Newton  force.®  Fig.6  shows 
the  results  of  the  force  calculation  with  the  FFT  method  (i.e.,  ART  without  refinements)  and  with  the 
relaxation  method  at  different  levels  of  refinement.  We  plot  relative  acceleration  errors  calculated  as  follows: 

f;rror= 

where  |aca/c|  is  the  acceleration  calculated  on  the  mesh  and  [at/jcorl  is  the  theoretical  acceleration.  The  upper 
panel  in  Fig.5  shows  the  relative  force  error  versus  interparticle  separation,  given  in  the  units  of  the  base 
grid,  for  a  pure  PM  calculation  and  for  the  second  and  fourth  refinement  levels.  Note  that  despite  the  general 
similarity  of  the  shape  of  FFT  and  relaxation  forces,  the  scatter  of  the  latter  at  small  separations  is  larger. 
The  relaxation  force,  however,  is  about  twice  “softer”  (it  reaches  a  «  5%  error  level  only  at  separations  «  2 
grid  cells,  the  FFT  force  reaches  this  level  at  «  1  cell).  The  scatter  at  small  separations  does  not,  however, 
mean  that  we  have  the  same  errors  in  particle  orbits.  In  the  case  of  a  circular  orbit  with  a  radius  of  3  cells, 
for  example,  the  error  in  radius  is  less  than  1%  for  at  least  a  few  orbital  periods  in  both  methods. 

The  lower  panel  in  Fig.6  shows  the  relative  error  for  ART  force  calculated  on  the  fourth  level  versus 
distance  in  the  units  of  the  fourth  level  mesh  cell  together  with  the  relative  error  corresponding  to  the 
Plummer  softened  force  (solid  line)  with  the  softening  parameter  e  (^  oc  l/Vr^  +  e^)  equal  to  the  size  of  the 
fourth  level  cell.  Comparison  shows  that  while  ART  force  fluctuates  around  zero  for  distances  greater  than 
two  mesh  cells,  the  Plummer  softened  force  is  considerably  lower  than  1/r^  law  for  up  to  six  grid  cells.  The 
relation  between  resolution  of  the  ART  code,  hARTt  and  Plummer  softening  length  is  thus  c  «  Gelb 

(1992)  studied  the  shape  of  the  Plummer  softened  force  in  his  P®M  code.  The  result  is  consistent  with  ours 
(Fig.  2.4  in  Gelb  1992)  -  the  force  starts  to  fall  down  at  a  distance  about  five  times  larger  than  the  softening 
length. 


*  We  should  note  that  particles  in  the  CIC  scheme  (see  Section  2)  have  cubic  rather  than  spherical  shape  so  that  the  exact 
force  at  sm^dl  separations  is  not  an  inverse  square  function  of  distance. 
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4.2.  Zel’dovich’  pancake  collapse 

The  one-dimensional  plane  wave  collapse  in  an  expanding  universe  is  one  of  the  traditional  tests  of 
jV-body  codes  (Klypin  &  Shandarin  1983;  Efstathiou  et  al.  1985).  In  this  test,  the  analytical  solution 
(Zel’dovich  1970)  is  used  to  check  how  accurately  the  code  integrates  particle  trajectories.  If  we  know  the 
initial  conditions,  the  solution  predicts  particle  positions  for  any  other  moment  of  time: 

xf  =  qf  +  cos  (27rk  •  q)  , 

here  are  the  initial  (unperturbed)  positions,  k  is  the  wavevector,  and  A{t)  is  the  amplitude.  In  an  f2  =  1 
universe,  A  =  a(^)/ao  where  oq  is  scale  factor  at  the  crossing  time.  The  corresponding  velocities  can  be 
obtained  by  differentiating  the  above  formula  for  positions. 

We  used  a  32^  base  grid  with  64®  particles.  The  particle  positions  and  velocities  were  initially  perturbed 
using  the  Zel’dovich  approximation.  The  particle  trajectories  were  integrated  by  the  ART  code  with  three 
levels  of  refinement  (the  density  threshold  for  opening  the  next  level  was  twenty  particles  in  a  cell  on  the  base 
grid  and  three  particles  in  a  cell  for  any  refinement  level).  In  Fig.7  we  show  one-dimensional  phase  diagrams 
at  the  crossing  time  where  results  of  the  ART  code  are  compared  with  results  of  PM  code  (32®  grid).  The 
figure  shows  that  the  ART  code  follows  the  analytical  solution  more  accurately  than  the  PM  code.  This  is 
shown  quantitatively  in  Fig.8,  where  we  plot  rms  deviations  of  the  particle  positions  and  velocities  from  the 

exact  solution  as  a  function  of  time  (Efstathiou  et  al.  1985):  Ax  rms  •— 

Avrms  =  ^  *  Starting  from  the  moment  when  the  first  refinement  level  was 

created  (a  «  0.29)  the  rms  deviations  were  systematically  lower  in  the  ART  code  than  in  the  PM  code. 


4.3.  Spherical  infall  test 


An  analytical  solution  describing  spherical  infall  of  material  onto  an  overdensity  in  an  expanding  cos¬ 
mological  framework  was  developed  by  Fillmore  &  Goldreich  (1984)  and  Bertschinger  (1985).  As  noted  by 
Splinter  (1996),  the  problem  possesses  a  symmetry  different  from  intrinsic  planar  symmetry  of  the  mesh 
codes.  This  makes  it  a  useful  and  strong  test.  The  analytical  solution  describes  the  evolution  of  a  spherical 
uniform  overdensity  Sp/p  =  6,*  1  in  a  region  which  has  proper  radius  Ri  at  some  initial  time  U  in  a  flat 

Einstein-de  Sitter  universe.  As  the  density  contrast  grows,  the  matter  initially  inside  Ri  is  increasingly  decel¬ 
erated.  Eventually  it  stops  expanding  and  turns  around  to  collapse.  If  the  initial  Hubble  flow  is  unperturbed 
(all  peculiar  velocities  are  initially  equal  to  zero),  the  initial  turnaround  occurs  at  the  time  tua- 


i„. = 


At  this  moment  the  initial  overdensity  reaches  its  maximum  radius: 


Hta  —  Ri^i  * 

Later  on,  shells  of  successively  larger  and  larger  radii  turn  around.  At  a  given  time  t  >  Utay  the  shell  of  a 

r,.(l)  =  r„.  j 
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starts  to  collapse  (Bertschinger  1985). 

The  solution  for  the  density  profile  of  the  overdensity  is  self-similar  in  terms  of  the  dimensionless  variable 
A  =  r/viait)  and  for  A  <  1  can  be  expressed  as  (Fillmore  k  Goldreich  1984;  Bertschinger  1985): 

D{\)  «  2.79A^^/^ 

while  at  larger  A  the  main  feature  of  the  solution  is  the  presence  of  sharp  caustics. 

We  modelled  the  spherical  infall  problem  described  above  by  distributing  32^  particles  uniformly  on 
the  32^  grid  (in  the  centers  of  the  grid  cells)  and  placing  additional  particles  in  a  sphere  of  2  grid  cell 
radius  in  the  center  of  the  computational  volume.  The  number  of  additional  particles  determines  the  initial 
overdensity  that  we  have  chosen  to  be  6,-  =  0.2.  We  integrated  particle  trajectories  from  a,-  =  =  0.1  up 

to  a  =  10.1  making  10000  steps  to  insure  that  the  Courant  condition  (i.e.,  particle  moves  only  a  fraction  of 
a  mesh  cell  in  a  single  time  step)  is  not  violated  on  any  of  the  five  refinement  levels  the  condition  required 
for  the  integration  to  be  stable.  The  refinement  levels  were  introduced  in  the  regions  where  density  was 
equivalent  to  more  than  six  particles  in  a  cell  and  the  number  of  levels  was  limited  to  five  making  thus  the 
effective  resolution  in  the  highest  density  regions  equivalent  to  1024.  In  Fig.9  the  calculated  density  profile  is 
compared  with  the  analytical  solution  (from  Tables  4  and  5  in  Bertschinger  1985).  We  see  a  good  agreement 
between  the  calculated  density  profile  and  the  analytical  solution  at  all  radii  down  to  the  resolution  limit. 


4.4.  Realistic  cosmological  runs:  comparison  with  PM  code 

To  test  the  performance  of  the  code  in  realistic  cosmological  simulations  we  made  a  set  of  runs  using 
ART  and  PM  codes  with  the  same  initial  conditions  and  different  spatial  resolutions  and  compared  the 
resulting  particle  distributions.  In  the  first  four  of  runs  we  simulated  an  L  =  20h''^  Mpc  box  with  N  =  32^ 
particles  assuming  a  flat  ACDM  cosmological  model  (Qa  =  0.7,  h  =  0.7,  erg  =  1.0).  In  ART  runs  we  allowed 
for  two  levels  of  refinement  starting  from  64^  and  128^  grids  -  we  will  call  these  runs  ART  64^  +  2L  and  ART 
128^  +  2L.  The  number  of  time  steps  was  1000  in  the  ART  64^  -f  2L  and  ^  2000  in  the  ART  128^  -h  2L 
run.  The  refinement  levels  were  introduced  wherever  the  density  exceeded  a  threshold  value  equivalent  to 
more  than  5  particles  per  cell.  The  PM  code  was  run  with  64®  (PM  64®)  and  256®  (PM  256®)  grids.  In 
Fig.  10  we  show  the  projected  particle  distribution  at  2:  =  0  from  these  four  runs.  The  global  distribution 
of  particles  and  halos  is  well  reproduced  by  the  ART  code.  Also,  halos  in  the  ART  64®  +  2L  simulation 
are  much  more  compact  than  in  the  PM  64®  simulation.  This  is  shown  quantitatively  in  Fig.ll,  where  we 
compare  density  distribution  functions  (the  fraction  of  the  total  mass  in  the  regions  of  a  given  overdensity) 
in  these  simulations.  The  density,  distributions  for  all  runs  were  computed  after  rebinning  the  density  field  to 
the  256®  grid.  The  resolution  of  a  simulation  puts  limits  on  the  maximum  density  in  the  halo  cores  because 
gravitational  collapse  virtually  stops  at  scales  of  1  grid  cell  (e.g.,  Klypin  1996).  Therefore,  the  density 
in  the  halo  cores  (the  high-density  tail  of  the  distribution)  is  a  good  indicator  of  the  spatial  resolution.  We 
note  that  the  density  distribution  functions  for  both  ART  64®  +  2L  and  PM  256®  runs  show  approximately 
the  same  behavior  reaching  overdensities  of  «  2  x  10^,  while  the  PM  64®  run  fails  to  produce  halos  with 
overdensities  greater  than  5  x  10®.  We,  therefore,  conclude  that  the  ART  code  produces  density  fields 
similar  to  those  of  the  PM  code  of  comparable  resolution. 

The  first  application  of  the  code  was  study  of  the  structure  of  dark  matter  halos.  Therefore,  as  a  final 
test  we  compared  the  halo  density  profiles  in  the  ART  and  PM  simulations.  The  size  of  the  simulation 
box,  L  =  Mpc,  was  chosen  to  be  the  same  as  in  the  larger  simulations  described  in  the  next  section. 
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The  rest  of  the  parameters  were  the  same  as  in  the  above  simulations.  We  simulated  the  evolution  of  the 
32^  particles  using  the  PM  code  with  256^  mesh  and  the  ART  code  with  a  64^  base  grid  and  three  levels 
of  refinement.  As  before,  we  refined  regions  where  the  local  density  exceeded  a  threshold  value  of  about 
5  particles  per  cell.  A  halo  finding  algorithm  (described  in  the  next  section)  was  applied  to  the  resulting 
particle  distribution.  In  Fig.  12  we  present  the  density  profiles  for  six  halos  of  different  masses.  The  mass 
resolution  in  these  simulations  (1.2  x  IO^^Mq)  determined  the  mass  range  of  halos.  The  most  massive  halo 
in  Fig.  11  consists  of  about  5000  particles  while  the  least  massive  contains  only  about  100  particles.  The 
density  profiles  of  PM  and  ART  halos  agree  reasonably  well  down  to  the  resolution  limit  (^  60/i~^  kpc). 


5.  An  application:  structure  of  dark  matter  halos  in  CDM  and  ACDM  models 

5.1.  Motivation 

We  used  the  code  to  study  the  structure  of  dark  matter  halos  in  two  of  the  currently  popular  cosmological 
models  -  Standard  Cold  Dark  Matter  (SCDM)  and  Cold  Dark  Matter  with  cosmological  constant  (ACDM). 
The  dark  matter  halos  play  a  crucial  role  in  the  formation  and  dynamics  of  galaxies  and  galaxy  clusters. 
Therefore,  theoretical  predictions  about  the  structural  and  dynamic  properties  of  the  halos  can  be  compared 
with  observations  and  used  as  a  powerful  test  of  a  given  theoretical  model.  The  numerical  study  of  the  halo 
structure  requires  very  high  spatial  dynamic  range  (at  least  ^  10^)  because  the  simulation  box  has  to  be 
large  enough  to  account  correctly  for  large  perturbation  waves  and  the  force  resolution  has  to  be  high  enough 
to  make  predictions  in  the  observational  range  (<  5  kpc).  The  ART  code  was  designed  to  handle  such  high 
dynamic  ranges. 

The  properties  of  dark  matter  halos  were  intensively  investigated  recently  for  a  variety  of  cosmological 
models.  Early  numerical  studies  (Frenk  et  al,  1985;  Quinn,  Salmon,  &  Zurek  1986;  Efstathiou  et  al. 
1988;  Frenk  et  al.  1988)  indicated  that  the  density  profiles  of  dark  matter  halos  in  hierarchical  clustering 
models  in  a  flat,  0  =  1,  universe  were  approximately  isothermal  (p(r)  oc  r"^)  in  agreement  with  analytical 
results  (Fillmore  &  Goldreich  1984;  Bertschinger  1985).  The  dependence  of  the  halo  density  profiles  on  the 
initial  perturbation  spectrum  and  on  specific  parameters  of  the  cosmological  model  were  also  studied  both 
analytically  (Hoffman  &  Shaham  1985;  Hoffman  1988)  and  numerically  (e.g.,  Crone,  Evrard,  &  Richstone 
1994).  These  early  numerical  studies,  however,  lacked  the  necessary  mass  and  spatial  resolution  to  make 
reliable  predictions  on  the  structure  of  the  halo  cores.  To  overcome  the  resolution  limits,  substantial  efforts 
were  made  to  simulate  the  formation  of  halos  from  isolated  density  perturbations  (e.g.  Dubinski  &  Carlberg 
1991;  Katz  1991)  or  to  resimulate  with  a  higher  resolution  halos  identified  in  large  low-resolution  runs 
(Navarro,  Frenk,  k  White  1996;  Tormen,  Bouchet,  &  White  1996).  These  simulations  have  the  advantage  of 
simulating  halos  in  a  wide  mass  range  with  homogeneous  spatial  and  mass  resolutions.  However,  the  tidal 
effects  from  the  neighboring  galaxies  and  effects  of  halo  mergers  are  simulated  rather  roughly.  It  is  therefore 
important  to  check  the  results  of  such  simulations  with  the  results  of  direct  simulation  of  halo  formation  in 
a  representative  box.  Studies  of  the  structure  and  dynamics  of  halos  extracted  from  high-resolution  direct 
simulations  were  done  by  Warren  et  al.  (1992)  and  Cole  &  Lacey  (1996).  Warren  et  al.  (1992)  used  TREE 
code  to  simulate  the  evolution  of  128^  particles  in  an  Q  =  1  universe  with  scale-free  initial  conditions.  The 
force  resolution  was  determined  by  imposing  a  Plummer  softening  of  e  =  5  kpc.  A  special  emphasis  was  made 
on  the  investigation  of  halo  shapes.  Similar  initial  conditions  were  used  in  the  study  by  Cole  &  Lacey  (1996), 
who  used  a  P^M  code  to  evolve  128^  particles.  The  resolution  in  the  latter  simulations  was  L/e  =  3840, 
where  L  is  the  size  of  the  computational  volume  and  e  is  the  Plummer  softening  parameter.  The  results 
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indicated  that  the  density  profiles  of  all  simulated  halos  are  well  fit  by  the  analytical  model  of  Navarro, 
Frenk,  &  White  (1996)  (hereafter  NFW).  This  model  has  p  oc  at  small  radii  and  steepens  smoothly  to 
p  (X  at  a  scale  radius  r^: 


p{r)  oc 


1 

r(H-  r/vsY' 


(20) 


The  density  profile  described  by  this  expression  is  singular,  because  density  rises  arbitrarily  high  when  r  — >  0 
forming  a  cusp.  The  cuspy  structure  of  the  central  parts  of  a  halo  represents  thus  a  generic  prediction  of  the 
model.  Although  the  NFW  profile  is  consistent  with  current  X-ray  and  gravitational  lensing  observations 
of  galaxy  clusters  (NFW),  the  p  a  behavior  is  in  contradiction  with  observations  of  dynamics  of  dwarf 
spiral  galaxies  that  imply  flat  central  density  profiles  (Flores  k  Primack  1994;  Moore  1994;  Burkert  1996). 
These  observations  can  serve  as  one  of  the  critical  tests  of  any  model  that  includes  a  dark  matter  component 
because  it  is  generally  believed  that  the  dynamics  of  the  dwarf  spiral  galaxies  is  dominated  by  dark  matter 
on  scales  r  >  1  kpc.  The  fact  that  the  NFW  profile  holds  for  a  variety  of  cosmological  models  (NFW;  Cole 
k  Lacey  1996)  indicates  its  possible  universality  for  CDM-like  models.  The  goal  of  the  present  study  was 
to  investigate  the  structure  of  dark  matter  halos  formed  in  a  ACDM  model.  This  model  is  currently  one 
of  the  most  successful  scenarios  of  structure  formation  in  the  Universe.  It  is,  therefore,  important  to  check 
whether  the  central  cusp  is  present  in  halos  formed  in  this  model. 


4 


5.2.  Simulations 

To  study  the  structure  of  dark  matter  halos  we  simulated  the  evolution  of  64®  particles  in  standard  CDM 
(Q  =  1,  h  =  0.5,  (Tg  =  0.63)  and  ACDM  (Q  =  0.3,  Da  =  0.7,  erg  =  1.0)  models.  We  made  three  runs  -  one 
high-resolution  (resolution  2h~^  kpc)  run  for  each  models,  and  a  lower  resolution  run  (resolution 
kpc)  for  the  ACDM  model  to  study  the  effects  of  resolution.  In  terms  of  the  Plummer  softening  length  (see 
Section  4.1),  our  resolution  corresponds  to  L/e  12000  for  the  high  resolution  runs,  and  L/e  w  3000  for 
the  low  resolution  run.  The  simulations  were  started  at  r  =  30  and  particles  trajectories  were  integrated  by 
making  3872  time  steps  in  the  low  resolution  run,  and  7743  time  steps  in  high-resolution  runs.  The  size  of 
the  simulation  box,  L  =  15h“^  Mpe,  determines  the  mass  resolution  (particle  mass)  -  1.523  x  lO^h'^Mg 
for  SCDM  and  5.077  x  lO^ft-^MQ  for  ACDM. 


5.3.  Halo  finding  algorithm 

To  identify  halos  in  our  simulations  we  use  an  algorithm  similar  to  that  described  in  Klypin,  Primack,  & 
Holtzman  (1996).  The  algorithm  identifies  halos  eis  local  maximaof  meiss  inside  a  given  radius.  The  efficiency 
of  the  algorithm  was  improved  by  incorporating  the  idea  of  Warren  et  al.  (1992)  of  finding  approximate 
locations  of  density  peaks  using  particle  accelerations.  This  idea  is  based  on  the  principle  that  particles  with 
the  largest  accelerations  should  reside  near  the  halo  centers  which  is  true  for  halos  with  roughly  isothermal 
density  profiles.  In  practice,  this  way  of  finding  density  maxima  has  proved  to  be  quite  efficient.  The  halo 
identification  algorithm  can  be  described  as  follows. 

1.  The  particles  are  sorted  according  to  the  magnitude  of  their  scalar  accelerations. 

2.  The  particle  with  the  largest  acceleration  determines  the  approximate  location  of  the  first  halo  center. 
Particles  located  inside  a  sphere  of  radius  rjnj*  centered  at  the  halo  center  are  assigned  to  the  same  halo  and 
are  excluded  from  the  list  of  particles  used  to  identify  halos.  The  radius  is  an  adjustable  parameter; 
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we  use  a  radius  approximately  twice  larger  than  the  force  resolution  of  a  simulation.  The  procedure  repeats 
for  the  particle  with  the  largest  acceleration  in  the  list  of  remaining  particles.  The  peaks  are  identified  until 
there  are  no  particles  in  the  list. 

3.  When  all  the  density  peaks  are  identified,  we  proceed  to  find  more  accurate  positions  of  the  halo 
centers.  This  is  done  iteratively  by  finding  the  center  of  mass  of  all  particles  inside  Vinu  and  displacing  the 
center  of  the  sphere  to  the  center  of  mass.  The  procedure  is  iterated  until  convergence. 

4.  When  the  halo  centers  are  found,  we  increase  Vinit  until  the  overdensity  inside  the  corresponding 
sphere  reaches  a  certain  limit.  The  limit  is  based  on  the  top-hat  model  of  gravitational  collapse  that  predicts 
the  typical  overdensity  for  virialized  objects  200  in  CDM  and  334  for  our  ACDM  model  (e.g.  Lahav 
et  al.  1991;  Kitayama  &  Suto  1996).  However,  we  denote  halo  radius  and  mass  inside  this  radius  defined 
as  M200  and  r2oo  regardless  of  the  actual  value  of  the  limit.  Smaller  halos  located  within  a  radius  r2oo  of  a 
bigger  halo  are  deleted  from  the  list. 

As  an  output  we  get  a  list  of  halo  positions,  velocities,  and  parameters  (such  as  r2oo  and  Af2oo)* 


5.4.  Results:  halo  density  profiles 

We  applied  the  halo  finding  algorithm  described  above  to  identify  halos  in  the  simulations  at  zero 
redshift.  Only  halos  with  more  than  100  particles  within  r2oo  were  taken  from  the  full  list.  We  also 
present  results  for  relatively  isolated  halos,  excluding  all  halos  which  have  close  (r  <  2r2oo)  neighbors  of 
mass  more  than  half  of  the  halo  mass.  The  density  profiles  were  constructed  by  estimating  the  density  in 
concentric  spherical  shells  of  logarithmically  increasing  thickness  with  the  smallest  radius  corresponding  to 
the  maximum  resolution.  The  resulting  profiles  were  fit  (the  fit  parameter  was  the  scale  radius  r^)  with 
the  analytical  formula  of  NFW  (eq.  20).  In  Fig. 13  and  14  we  show  density  profiles  of  9  halos  of  different 
mass  identified  in  the  high-resolution  CDM  and  ACDM  simulations  along  with  the  analytical  fits.  The  NFW 
profile  appears  to  be  a  good  approximation  for  halos  of  all  masses  (within  the  mass  range  of  our  simulations) 
in  both  CDM  and  ACDM  models. 

NFW  argued  that  the  concentration  parameter,  c  =  r^oo/^^  (see  eq.20),  of  a  dark  matter  halo  depends 
on  the  halo  mass.  They  have  found  that  low  mass  halos  are  more  centrally  concentrated  than  high  mass 
ones  which  possibly  reflects  different  formation  redshifts  of  halos.  Fig.l5  shows  the  concentration  c  as  a 
function  of  halo  mass  (M200)  for  halos  identified  in  our  simulations.  The  solid  curve  represents  a  theoretical 
prediction  (see  NFW  for  discussion)  assuming  a  definition  for  the  formation  time  of  a  halo  as  the  first  time 
when  half  of  its  final  mass  M200  was  in  progenitors  with  individual  masses  exceeding  fraction  /  =  0.01  of 
M200*  This  particular  value  of  /  seemed  to  provide  the  best  approximation  to  the  numerical  results  of  NFW 
for  the  CDM  model.  The  results  of  both  the  CDM  and  the  ACDM  simulations  agree  reasonably  well  with 
this  curve.  The  larger  spread  of  parameter  c  for  low  mass  halos  is  due  largely  to  the  statistical  noise.  The 
lowest  mass  halos  (M200  10^^  ^  M©)  contain  a  few  hundred  particles  within  their  r2oo  and  thus  have  more 

noisy  density  profiles  (typical  2(t  error  in  logc  ^  0.2  —  0.3)  compared  to  more  massive  halos  (M200  >  10^^ 
Mq)  which  have  tens  of  thousands  of  particles  (error  in  logc  0.05). 
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5.5.  Effects  of  resolution 

It  is  important  to  model  reliably  the  structure  of  the  very  central  parts  (r  <  10/i“^  kpc)  of  a  halo 
because  that  is  where  we  can  compare  model  predictions  with  observational  results.  Unfortunately,  that  is 
also  where  force  resolution  may  strongly  affect  the  shape  of  the  density  profiles.  To  study  possible  effects  of 
resolution,  we  have  compared  density  profiles  of  the  same  halos  taken  from  ACDM  simulations  of  different 
resolution  described  in  Section  5.2.  In  Fig.  16  we  compare  density  profiles  of  4  halos  from  these  simulations. 
As  before,  the  density  profile  is  drawn  only  to  the  resolution  limit  of  simulation.  We  conclude  that,  up  to 
the  resolution  limit,  the  lower  resolution  density  profile  follows  that  of  the  higher  resolution  simulation. 


5.6.  Conclusions 

We  studied  the  structure  of  dark  matter  halos  in  CDM  and  ACDM  models  with  a  resolution  of 
kpc  in  a  box  of  15/i”^  Mpc. 

1.  We  found  that  for  r  <  r20O)  the  density  profiles  of  a// halos  in  both  CDM  and  ACDM  simulations  are 
well  fit  by  the  analytical  formula  (eq.  20)  of  Navarro  et  al.  (1996). 

2.  The  mass  dependence  of  the  halo  concentration  parameter  c  in  our  simulations  is  consistent  with  the 
results  of  Navarro  et  al.  (1996). 

3.  The  fact  that  our  results  for  the  CDM  model  agree  with  the  results  of  Navarro  et  al.  (1996)  serves 
both  as  a  final  test  of  the  presented  code  and  as  an  independent  check  of  their  method  with  results  from 
direct  cosmological  simulations. 


6.  Discussion  and  conclusions 

We  present  a  new  high-resolution  AT-body  code  which  incorporates  the  idea  of  Adaptive  Refinement 
Tree  (Khokhlov  1997)  to  build  a  hierarchy  of  refinement  meshes  in  regions  where  higher  resolution  is  desired. 
Unlike  other  AT-body  codes  that  make  use  of  refinement  meshes,  our  code  is  able  to  construct  meshes  of 
arbitrary  shape  covering  equally  well  both  elongated  structures  (such  as  filaments  and  walls)  and  roughly 
spherical  dark  matter  halos.  The  meshes  are  modified  to  adjust  to  the  evolving  particle  distribution  instead 
of  being  rebuilt  at  every  time  step.  We  use  a  cubic  grid  as  the  zeroth  level  of  the  mesh  hierarchy.  The  size 
of  this  grid  determines  the  minimum  possible  resolution  of  a  simulation  (i.e,,  resolution  in  regions  where 
there  are  no  refinements).  The  code  blocks  working  on  the  zero  level  grid  are  similar  to  those  of  a  PM  code. 
To  solve  the  Poisson  equation  on  refinement  meshes  we  have  developed  a  new  solver  that  uses  a  multilevel 
relaxation  method  with  Successive  Overrelaxation  (Hockney  &  Eastwood  1981;  Press  et  al.  1992).  The 
solver  is  fully  parallel  and  is  only  2  times  slower  than  an  FFT  solver  for  the  same  number  of  mesh  cells.  In 
real  simulations  with  the  same  resolution,  the  relaxation  solver  outperforms  the  FFT  because  the  resolution 
is  achieved  with  much  smaller  number  of  cells  (see  Section  3.7)  The  presented  tests  (Section  4)  show  that  our 
code  adequately  computes  gravitational  forces  down  to  scales  of  ^  1.5-2  mesh  cells.  The  memory  overhead 
in  the  current  version  of  the  code  is  rather  large  compared  to  other  high-resolution  codes.  The  number  of 
required  mesh  cells,  however,  changes  very  slowly  with  increasing  resolution.  At  present,  the  code  is  capable 
of  handling  a  dynamic  range  as  high  as  10000.  Also,  tests  of  the  code  performance  show  that  it  is  about 
three  times  faster  than  an  AP^M  code  (and  thus  TREE  code,  see  Couchman  1991)  of  comparable  resolution. 
Still,  the  Courant  condition  requiring  that  particles  move  only  a  certain  fraction  of  a  mesh  cell  at  every  time 
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step  makes  the  code  CPU  rather  than  memory  limited. 

The  present  version  of  the  code  is  by  no  means  optimal  and  we  plan  the  following  improvements. 

1.  The  memory  requirement  of  the  code  can  be  significantly  reduced  if  pointers  that  are  used  to  support 
the  tree  refinement  structure  are  shared  by  siblings  (descendants  of  the  same  parent  cell).  The  memory 
overhead  can  be  reduced  even  further  by  incorporating  more  elaborate  algorithms  of  data  storage  (Khokhlov 
1997). 

2.  The  current  version  of  the  code  (as  well  as  any  other  high-resolution  code)  is  constrained  by  the 
Courant  condition.  To  insure  that  this  condition  is  satisfied  on  the  maximum  refinement  level  requires  small 
time  steps  redundant  for  particles  moving  on  coarser  meshes.  We  plan  to  incorporate  an  integration  scheme 
with  multiple  time  steps. 

3.  We  plan  to  integrate  the  present  iV-body  code  with  a  high-resolution  Eulerian  hydrodynamics  code 
(Khokhlov  1997)  which  works  on  similar  refinement  meshes. 

We  have  used  the  ART  code  to  study  the  structure  of  dark  matter  halos  in  two  cosmological  models  - 
standard  CDM  (Q  =  1,  =  0.5,  (Tg  =  0.63)  and  a  variant  of  ACDM  (fl  =  0.3,  Qa  =  0.7,  erg  =  1.0).  We  have 

found  that  halos  formed  in  ACDM  model  have  density  profiles  similar  to  halos  formed  in  CDM  model.  The 
density  profiles  are  well  described  by  the  analytical  formula  (eq.  20)  presented  by  Navarro  et  al.  (1996)  and 
have  cuspy  (p(r)  a  r^^)  structure  in  the  central  (r  <  lO/i"^  kpc)  parts  of  a  halo  with  no  indications  of  a 
core  down  to  the  resolution  limit  of  our  simulations.  We  therefore  conclude  that  halos  formed  in  the  ACDM 
model  have  structure  similar  to  the  CDM  halos,  and  thus  cannot  explain  dynamics  of  the  central  parts  of 
dwarf  spiral  galaxies  inferred  from  their  rotation  curves. 
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Chris  Loken  for  the  help  in  improving  the  presentation  of  the  paper.  This  project  was  supported  in  part 
by  the  grant  AST9319970  from  the  National  Science  Foundation  and  by  the  Office  of  Naval  Research.  The 
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Fig.  1. —  Schematic  illustration  of  the  tree  structure  and  pointers  used  to  support  it  in  one  dimension. 
Scheme  (a)  shows  two  neighbor  cells  (Parent  and  Neighbor)  one  of  which  (Parent)  is  split  (it  has  two 
children  denoted  as  Childl  and  Child  2)  and  the  other  (Neighbor)  is  a  leaf.  The  arrows  denote  pointers  used 
to  support  the  structure  (see  text  for  details).  Arrows  drawn  with  broken  lines  denote  pointers  which  can 
be  shared  by  all  children.  Below  (b)  we  show  the  actual  locations  of  the  mesh  cells  in  space. 
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fig.  2 


Fig.  2. —  Examples  of  permitted  and  prohibited  neighbor  configurations.  The  prohibited  configurations  (the 
cells  which  violate  the  neighbor  rule)  are  indicated  with  arrows.  Note  that  it  is  not  allowed  to  have  neighbors 
with  level  difference  greater  than  1  but  it  is  allowed  to  have  a  “corner  neighbor”  with  level  difference  2. 
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Fig.  3.—  (a)  A  slice  through  the  refinement  structure  (base  grid  is  not  shown)  in  one  of  the  ACDM  simulations 
with  32®  particles  (see  section  4.4)  and  (b)  corresponding  slice  through  the  particle  distribution.  The  square 
in  (a)  shows  the  area  which  is  enlarged  in  Fig. 4 
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Fig.  4. —  Fragment  of  the  slice  from  Fig.  3a  denoted  by  the  square.  Note  that  the  mesh  generator  tends 
to  build  almost  rectangular  meshes  around  dense  isolated  clumps  of  particles  while  to  trace  a  filament 
the  generator  creates  meshes  of  arbitrary  shape  that  effectively  cover  the  elongated  structures  in  particle 
distribution. 
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Fig.  5. —  a)  Timing  for  ACDM  run  with  32®  particles  on  8  processors  of  SPP-1200  Exemplar  (see  section 
3.8  for  discussion).  The  total  CPU  time  per  step  was  scaled  down  by  a  factor  of  2  for  convenience,  b) 
Energy  conservation  error  versus  expansion  factor  a  in  ACDM  runs  with  32®  (solid  line)  and  64®  (broken 
line)  particles. 
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Fig.  6. —  a)  Pairwise  force  accuracy  of  the  ART  code  on  the  base  regular  grid  (FFT  solver)  and  on  the 
second  and  fourth  refinement  levels  (relaxation  solver)  versus  interparticle  separation,  b)  Comparison  of 
the  force  accuracy  on  the  fourth  refinement  level  with  the  theoretical  accuracy  of  a  Plummer  softened  force 
versus  interparticle  separation  in  units  of  the  fourth  level  cells. 
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Fig.  7. —  Plane  wave  collapse  test:  phase  diagram  in  Lagrangian  coordinates  q  with  3  refinement  levels  (a) 
and  in  PM  simulation  with  32^  grid  (b)  at  the  crossing  time.  Solid  line  represents  the  analytical  solution 
while  polygons  show  numerical  results.  Number  of  polygon  vertices  is  equal  to  the  level  plus  three  so  that 
triangles  show  particles  located  on  the  base  grid,  squares  show  particles  located  on  the  first  refinement 
level,  and  so  on.  The  corresponding  phase  diagram  for  physical  coordinates  x  are  shown  in  panels  c  and  d. 
The  lagrangian  coordinates  show  the  differences  between  ART  and  PM  results  more  clearly,  because  at  the 
crossing  time  the  i;  —  a:  phase  diagram  is  almost  vertical  in  the  central  part  of  the  pancake. 
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Fig,  8, —  Plane  wave  collapse  test:  rms  deviations  of  coordinates  (a)  and  velocities  (b)  from  analytical 
solution  versus  expansion  parameter  for  PM  code  (solid  line)  and  for  ART  code  with  three  levels  of  refinement 
(broken  line).  Starting  from  the  moment  when  the  first  refinement  level  was  introduced  (a  ^  0.29)  the  rms 
deviations  in  ART  code  are  lower  than  in  PM  code 
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Fig.  9. —  Spherical  infall  test:  simulated  density  profile  (filled  circles)  is  compared  with  the  analytical 
solution  (solid  line).  The  points  for  the  analytical  solution  are  taken  from  Tables  4  and  5  in  Bertschinger 
(1985).  The  simulated  density  profile  was  constructed  by  estimating  the  density  in  concentric  spherical  shells 
of  logarithmically  increasing  thickness  with  the  smallest  radius  corresponding  to  the  maximum  resolution. 
The  error  bars  correspond  to  the  poisson  noise. 


log  p/<p>  log  p/<p>  log  p/<p> 


log  r  (h’^  kpc) 


log  r  (h“^  kpc) 


Fig.  12. —  Comparison  of  density  profiles  for  halos  identified  in  PM  (dashed  lines)  and  ART  (solid  lines) 
simulations  (32^  particles)  of  comparable  resolution  (^  60/i“^  kpc)  started  from  the  same  initial  conditions. 
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Fig.  13. —  Density  profiles  (solid  lines)  for  9  halos  of  different  masses  taken  from  CDM  simulation  with  64^ 
particles  (resolution  2h~^  kpc)  and  the  best  fit  of  the  analytical  profile  (dotted  lines)  of  Navarro  et  al. 
(1996).  The  numbers  indicate  mass  of  a  halo  inside  a  radius  corresponding  to  over  density  200. 
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Fig.  14. —  The  same  as  Fig.  13  for  9  halos  of  extracted  from  ACDM  simulation  (64®  particles,  resolution 
^  kpc).  The  numbers  correspond  to  a  halo  mass  inside  a  radius  of  overdensity  334  (see  Section  5.4). 
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Fig.  15. —  Logarithm  of  the  concentration  parameter,  c  =  r2oo/^5,  versus  logarithm  of  halo  mass  M200  for 
high-resolution  kpc)  CDM  (filled  circles)  and  ACDM  simulations  (empty  circles).  The  solid  curve 

shows  the  mass-concentration  relation  predicted  from  the  formation  times  of  halos  which  best  described  the  ^ 

numerical  results  of  Navarro  et  al.  (1996). 
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Fig.  16. —  Effects  of  resolution  on  halo  density  profiles.  The  profiles  of  4  halos  taken  from  a  ACDM  run  with 
^  2h~^  kpc  resolution  (solid  lines)  and  a  run  with  ~  7/i”^  kpc  resolution  (broken  lines)  are  plotted  down  to 
the  resolution  limit. 


